Stochastic three-dimensional numerical phantoms to enable computational studies in quantitative optoacoustic computed tomography of breast cancer

Abstract. Significance When developing a new quantitative optoacoustic computed tomography (OAT) system for diagnostic imaging of breast cancer, objective assessments of various system designs through human trials are infeasible due to cost and ethical concerns. In prototype stages, however, different system designs can be cost-efficiently assessed via virtual imaging trials (VITs) employing ensembles of digital breast phantoms, i.e., numerical breast phantoms (NBPs), that convey clinically relevant variability in anatomy and optoacoustic tissue properties. Aim The aim is to develop a framework for generating ensembles of realistic three-dimensional (3D) anatomical, functional, optical, and acoustic NBPs and numerical lesion phantoms (NLPs) for use in VITs of OAT applications in the diagnostic imaging of breast cancer. Approach The generation of the anatomical NBPs was accomplished by extending existing NBPs developed by the U.S. Food and Drug Administration. As these were designed for use in mammography applications, substantial modifications were made to improve blood vasculature modeling for use in OAT. The NLPs were modeled to include viable tumor cells only or a combination of viable tumor cells, necrotic core, and peripheral angiogenesis region. Realistic optoacoustic tissue properties were stochastically assigned in the NBPs and NLPs. Results To advance optoacoustic and optical imaging research, 84 datasets have been released; these consist of anatomical, functional, optical, and acoustic NBPs and the corresponding simulated multi-wavelength optical fluence, initial pressure, and OAT measurements. The generated NBPs were compared with clinical data with respect to the volume of breast blood vessels and spatially averaged effective optical attenuation. The usefulness of the proposed framework was demonstrated through a case study to investigate the impact of acoustic heterogeneity on OAT images of the breast. Conclusions The proposed framework will enhance the authenticity of virtual OAT studies and can be widely employed for the investigation and development of advanced image reconstruction and machine learning-based methods, as well as the objective evaluation and optimization of the OAT system designs.


Introduction
Optoacoustic computed tomography (OAT), also known as photoacoustic computed tomography, is a non-invasive imaging modality being actively developed for clinical breast imaging and other biomedical applications. [1][2][3][4][5][6][7][8] A unique feature of OAT is the ability to produce an image based on the endogenous optical contrast associated with chromophore concentrations and oxygenation states within tissue, without ionizing radiation and the loss of spatial resolution typically related to purely optical techniques such as diffuse optical tomography. 1,9 This permits the imaging of tissue metabolism and angiogenesis, which have been identified to play a critical role in tumor growth and progression. 7,10 Therefore, optoacoustic imaging is ideally positioned to resolve these two hallmarks of cancer in vivo. [2][3][4][5][6][7][8]10 As such, an optimized and validated OAT system can be a powerful tool for the management of breast cancer. By assessing the tumor microvasculature density and blood oxygenation, it can enable the initial evaluation of tumor aggressiveness to inform the treatment plan and prognosis. It also allows for the monitoring of tumor response to treatment over time. However, to realize its full diagnostic potential, OAT should be equipped with the capability of providing quantitative information on true values of the optical absorption coefficient, which is proportional to molecular concentrations. 7,11,12 A large number of different system designs for three-dimensional (3D) breast OAT that deploy varying light delivery and acoustic detection strategies have been proposed. [3][4][5][6]13 This is unlike in X-ray mammography, breast magnetic resonance imaging (MRI), and breast ultrasound, in which similar implementations are typically in use per modality. Ideally, candidate designs of new quantitative OAT systems would be evaluated based on clinically relevant objective image quality measures via human subject studies. However, this is not a feasible solution given the large space of possible (technical trade-offs considered) design parameters of OAT imaging systems, the large variety in breast sizes and compositions, and the cost and potential ethical concerns associated with such studies. Instead of clinical trials, virtual imaging trials (VITs), i.e., computer-simulation studies, have been advocated for assessment and optimization of system and algorithm designs in the early stages of technology development. For VITs to be clinically relevant, realistic numerical phantoms must be employed as the to-be-imaged objects. [14][15][16][17][18][19][20] Moreover and importantly, because imaging technologies are not typically optimized for use with only a single subject, ensembles of different phantoms that possess clinically relevant variability in anatomy and tissue properties must be virtually imaged. In this way, ensemble-averaged objective measures of image quality can be computed. 21 Several numerical breast phantoms (NBPs) and numerical lesion phantoms (NLPs) have been proposed. [16][17][18][19][20]22 However, most of the existing NBPs and NLPs were created from a limited number of clinical data, resulting in a lack of variability or oversimplified anatomical structures. [16][17][18][19] The virtual imaging clinical trials for regulatory evaluation (VICTRE) project at the U.S. Food and Drug Administration (FDA) 14 released software tools to generate realistic NBPs and NLPs for use in mammography applications. Although the VICTRE NBPs are considerably realistic for VITs of mammography in terms of principal tissue compositions, significant improvements are required for use in VITs of breast OAT. Bao et al. 20 reported optoacoustic NBPs based on the VICTRE tools. However, these phantoms do not introduce adaptations of the blood vascular network for use in VITs of OAT and do not account for physiological variability in tissue optoacoustic properties, the values of which are fixed and deterministic for each tissue type.
Although simulation tools for photon transport and acoustic wave propagation [23][24][25] in general media with spatially varying voxel-wise properties are available, the reported studies that employed NBPs did not leverage this capability. Not only in Ref. 20 but also in the other previous studies, [16][17][18][19]22 piecewise-constant optical properties were assigned to each tissue type in the NBPs and NLPs without taking into account spatial variability in the property distribution induced by oxygen transport between tissues. In addition, a correlation between optical properties and both chromophore concentrations and the oxygenation state of the tissue has not been addressed in the existing NBPs and NLPs. [16][17][18][19][20]22 Therefore, further enhancements are needed to establish physiologically realistic NBPs and NLPs for use in VITs of OAT.
In this work, an end-to-end framework for producing ensembles of realistic 3D anatomical, functional, optical, and acoustic NBPs and NLPs for use in VITs of quantitative OAT of breast cancer is established. The generation of the anatomical NBPs is accomplished by extending the VICTRE NBPs with blood vasculature modifications for use in VITs of OAT. Tissue composition within the anatomical NLPs is modeled according to different lesion types and aggressiveness. Depending on the type of cancer of interest, the lesions can contain a necrotic core and/or a tumor angiogenesis region surrounding the viable tumor cell region, which can be modeled because of the proposed modifications to the VICTRE NLPs. Realistic functional, optical, and acoustic properties are stochastically assigned for each breast tissue. The optical absorption coefficient is modeled based on the concentrations of the primary chromophores considered, which are stochastically chosen within their physiological ranges. Our modified version of the VICTRE tool code package, 26 which is a fork of the original software, has been released under the same creative commons zero license (CC0) used by the original project. Furthermore, the software framework to generate stochastic distributions of the functional, optical, and acoustic properties based on the modified VICTRE anatomical phantoms has been made publicly available under the GNU general public license version 3 (GPLv3). The software, named stochastic optoacoustic NBP (SOA-NBP), 27 is implemented in Python, and it includes capabilities for inserting superficial blood vasculature under the skin layer; creating anatomical NLPs; and assigning functional, optical, and acoustic properties to each breast tissue type. To enable researchers to immediately benefit from this work, 84 datasets were released under the CC0; these consist of anatomical, functional, optical, and acoustic NBPs and the corresponding simulated multi-wavelength optical fluence, initial pressure, and OAT measurement data. [28][29][30] The remainder of this article is organized as follows. Relevant background information including a description of the VICTRE and previous NBPs for OAT is provided in Sec. 2. The proposed framework is described in Sec. 3, and several examples of NBPs generated employing the proposed framework are presented in Sec. 4. A case study that investigates the impact of acoustic breast heterogeneity on image reconstruction quality is provided in Sec. 5, as an example of how the proposed stochastic phantoms can enable important studies. The article concludes with a summary and discussion in Sec. 6.

Background
2.1 VICTRE Project by the FDA The VICTRE project by the FDA 14 developed software tools to generate 3D numerical representations of human female breasts and lesions for use in simulating X-ray mammography applications. These tools can create ensembles of stochastic, anatomically realistic breast structures and lesions within a user-defined 3D volume by specifying breast density (i.e., fat fraction), shape, size, and tissue composition parameters. 14 The produced NBPs correspond to one of the following four types defined in the breast imaging reporting and data system (BI-RADS) 31 : (A) breast is almost entirely fatty, (B) breast has scattered areas of fibroglandular density, (C) breast is heterogeneously dense, and (D) breast is extremely dense. The tissue types in the VICTRE NBPs are fat, skin, glandular tissue, nipple, muscle, ligament, terminal duct lobular unit (TDLU), duct, artery, and vein. 14 The VICTRE tools can also generate NLPs based on microcalcification clusters or spiculated masses. 14 The NLPs are inserted at locations randomly selected from those predicted based on the duct and TDLU structures that are well-known sites for lesion formation. 32 There exist several challenges that must be addressed to extend the VICTRE project to produce NBPs for use in VITs of breast OAT technologies. To perform such VITs, NBPs and NLPs that describe the optical and acoustic properties of the breast and lesion need to be established; they should be stochastic in nature and describe realistic values under typical physiological and pathological conditions for each tissue type. Because many 3D OAT technologies are not handheld and utilize a fixed imaging geometry, the breast shape parameters need to be determined to be consistent with a prone position during a 3D OAT scan. [3][4][5][6] Additionally, the representation of blood vasculature in the NBPs generated using the VICTRE tools, albeit sufficiently realistic for mammography applications, 14 needs to be improved for realistic VITs of OAT because the VICTRE tool primarily produces deep-seated blood vessels rather than those near the skin layer that are dominantly exhibited in clinical OAT images. [3][4][5][6]33 This is consequential because OAT has greater sensitivity to blood vessels than other tissues. 1,9 2.2 Previous NBPs for OAT Several NBPs for OAT have been proposed, [16][17][18][19][20]22 and their features are summarized in Table 1. The NBPs in Refs. 17-19 have oversimplified structures, and only a handful of relatively thick blood vessels are included in the NBPs created via tissue segmentation from a limited number of X-ray mammography 22 and MRI images. 16 In Refs. 18, 19, and 22, a rounded-shaped NLP was inserted into the NBP, but the malignant tumors' characteristics revealed in OAT images, such as tumor hypoxia and angiogenesis, were not modeled.
Bao et al. 20 presented FDA's VICTRE NBPs with deterministic oxygen saturation and optical and acoustic properties assigned in a piecewise constant manner at two wavelengths of 700 and 900 nm. Also, the VICTRE NBPs were used without any modifications to the breast anatomy and shape, so the above mentioned challenges related to the blood vascular network and the patient's position during the OAT scan remain unmet by this phantom design. The assignment of piecewise constant optical properties has been commonly adopted in most previous studies on numerical phantoms for OAT. [16][17][18][19][20]22 However, this choice was mostly due to the need to reduce modeling and computational complexity. This limits the realism and usefulness of the generated optical phantoms. The optical absorption coefficient is determined by chromophore concentrations and hemoglobin oxygenation states within the tissue. 34,35 Because the oxygen carried by blood diffuses through the surrounding tissue, the oxygen saturation distribution, and thus the optical absorption coefficient distribution, spatially varies in tissues. In Ref. 20, the oxygenation levels were neither considered nor correlated to the optical properties, except for the arteries and veins. The levels were set with fixed values of 95% for the arteries and 75% for the veins despite their varied values within a certain range. 36 The optical properties at two wavelengths of 700 and 900 nm were directly assigned to the NBPs, and thus, these phantoms are unsuitable for VITs of breast multispectral OAT.

References
Phantom features 17 • 3D single breast tissue type and a cylindrical cyst • Optical absorption coefficient (μ a ), reduced scattering coefficient, scattering anisotropy (g), and refractive index (n) for a wavelength of 1064 nm; sound speed (c) and acoustic attenuation coefficient (α 0 ) with a power law exponent (y)

and 19
• 2D single breast tissue type and an elliptical 18 /circular 19 tumor • μ a and optical scattering coefficient (μ s ) for a wavelength of 800 nm; c and density (ρ) 19 16 a • 3D skin, fat, and fibroglandular tissues, as well as a handful of blood vessels, segmented from MRI images • μ a , μ s , g, and n for a wavelength of 760 nm; c and ρ

22
• 2D skin, fat, fibroglandular, and rounded tumor tissues, segmented from digital mammography • μ a and μ s for an unknown wavelength; c

20
• 3D NBPs of FDA's VICTRE without any modifications for healthy breast tissues, and lesion models of fibroadenoma, DCIS, and IBC • μ a , μ s , g, and n for wavelengths of 700 and 900 nm; c and ρ a Three datasets are publicly available.
Oval-shaped fibroadenoma and irregularly shaped ductal carcinoma in situ (DCIS) and invasive breast cancer (IBC) were modeled in Ref. 20 using the VICTRE tool. As modifications to the VICTRE NLPs, a few thin veins were inserted inside the DCIS mass, and a centripetal vein and a surrounding arterial area with its irregularly shaped boundary were added for IBC. However, the characteristics and diameters of the inserted vasculature did not account for realistic lesion growth. Specifically, the features of the inserted vasculatures were based on clinical data of aggressive DCIS and IBC lesions larger than or equal to 47 mm in diameter, 37 whereas the diameter of the simulated lesion was less than 10 mm. Furthermore, the method to generate the centripetal vein and the irregularly shaped arterial area was not explained in Ref. 20.

Methods
To circumvent the limitations described above, adaptations and customizations of the VICTRE NBPs were developed to enable the generation of large ensembles of realistic optoacoustic NBPs that exhibit clinically relevant variability in anatomical structures and functional, optical, and acoustic properties. The flow chart of optoacoustic NBP generation is illustrated in Fig. 1, and details of each step are explained in the following sections.

Generation of Realistic Anatomical NBPs and Lesion Insertion
In this first step, the construction of ensembles of realistic anatomical NBPs for different BI-RADS breast types [(A) breast is almost entirely fatty, (B) breast has scattered areas of fibroglandular density, (C) breast is heterogeneously dense, and (D) breast is extremely dense] using a modified version of the VICTRE tool is described. Additionally, the generation and insertion of anatomical NLPs into the NBPs are described.

Definition of breast size and shape
Based on clinical data and constraints by designs of the existing OAT breast imaging systems 3-6 (where a breast should fit within a scanning radius of 85 mm), the distributions of breast size parameters were determined for each BI-RADS breast type. 38 The VICTRE tool generates anatomical NBPs according to the parameters specified in the configuration file. Among the parameters, the breast volume extent parameters a 1t , a 1b , a 2l , a 2r , and a 3 and the breast shape parameters ϵ 1 , B 0 , B 1 , H 0 , and H 1 were specified in the VICTRE configuration file. These parameters are explained in Ref. 15, and the probability distributions for the parameters that are consistent with the patient's position during a 3D OAT scan are summarized in Table 2.
Here, a truncated Gaussian distribution (TN) was chosen among the possible distributions because it is the maximum entropy distribution supported in a bounded interval with a specified mean and standard deviation. 39 Fig. 1 Generation of optoacoustic NBPs.
The generated anatomical NBPs were discretized in three dimensions using a uniform Cartesian lattice. A voxel size of 0.125 mm was chosen for the discretization, considering computation time and memory usage. This choice allows for avoiding the discretization inverse crime when performing image reconstruction as the typical resolution in OAT breast images is between 0.2 and 0.5 mm. Each voxel value corresponds to an unsigned 8-bit integer tissue label. 14 The anatomical NBPs were rotated to be consistent with a prone position during a 3D OAT scan and cropped to exclude the chest muscle region.

Realizations of blood vasculature
A specific innovation in our adaptation is the introduction of realistic blood vasculature, as shown in Figs. 2(a) and 2(d). The VICTRE tool iteratively generates sibling and child branches of arterial and venous blood vessel trees within the breast volume, using randomly sampled values based on blood vessel parameters (Table 4) in the configuration file. 20 However, the tool assumes four arterial and five venous blood vessel trees with their entry locations predefined by fixed polar angles θ and a fixed distance from the breast edge d edge ¼ 20 mm [ Fig. 2(e)]. This is inconsistent with the anatomy of the breast. Furthermore, the values were hardcoded within the C++ source code of the VICTRE tool, making it inaccessible for users to adjust them through the configuration file. Thus, to more realistically model breast anatomy, 41 the source code of the VICTRE tool was modified to incorporate five sets of blood vessel trees: internal mammary, thoraco-acromial, lateral thoracic, subscapular and thoraco-dorsal, and intercostal arteries and veins [ Fig. 2(d)]. Two tunable parameters, vesselEdgeSep1 and vesselEdgeSep2 [d edge in Table 3], were added to the configuration file to set the distances from the breast edge (1) to the internal mammary, thoraco-acromial, lateral thoracic, and subscapular and thoraco-dorsal arteries and veins and (2) to the intercostal arteries and veins, respectively [ Fig. 2 Table 3 summarizes the entry locations of the blood vessel trees in the left breast, and those in the right breast were set to be bilaterally symmetric.
Among blood vessel tree, branch, and segment parameters in the VICTRE tool, the parameters maxBranch, initRad, minRadFrac, numTry, maxTry, and absMaxTry provided in Table 4 were tuned based on the breast volume percentage occupied by blood vessels estimated from four clinical OAT datasets. The clinical datasets used as references were acquired by TomoWave Laboratories (Houston, Texas, United States) using LOUISA-3D 3 at the MD Anderson Cancer Center. The experimental 3D OAT images were reconstructed using a filtered back-projection , the ϵ 1 value is set to "1," and the B 0 , B 1 , H 0 , and H 1 values are set to "0."   (FBP) method. 42 To compute the breast volume percentages occupied by blood vessels from the reconstructed images, the distributions of the total hemoglobin concentration were estimated by employing spectral linear unmixing with optical fluence normalization. 33 Then, a vessel enhancement filter 43 was applied to the estimated distributions of the total hemoglobin concentration [ Fig. 2(c)], and the ratio of the numbers of the blood vessel voxels and breast voxels was calculated.
The blood vessels within the depth of 10 mm under the skin layer are dominantly observed with relatively high image contrast in clinical OAT images, i.e., estimated distributions of the initial pressure, but are missing in the VICTRE NBPs [Figs. 2(b) and 2(e)]. To address this, additional blood vasculature [Figs. 2(a) and 2(d)] was stochastically generated using computationally efficient methods (random sampling, Gaussian blurring, Otsu's thresholding, 44 and skeletonization), which are commonly used in image processing. The diameter of the blood vessels and their depth from the skin are tunable and were set to 0.75 and 0.375 mm, respectively. Multiple blood vessel segments were formed and alternately divided into arteries and veins depending on the segment locations. Instead of the proposed method, another user-defined method can be employed in our proposed framework if desired. Details of the blood vasculature generation are provided in Appendix A.

Generation and insertion of lesions
Most common types of malignant lesions, such as invasive ductal carcinoma and invasive lobular carcinoma, have anatomically irregular outlines and develop in milk ducts and lobules. 32 Depending on the type of lesion and its aggressiveness, the lesion growth can be accompanied by severe hypoxia in the center region, tumor necrosis, and tumor angiogenesis in the periphery of the viable tumor cells. [45][46][47] Users can model lesions as being composed solely of a viable tumor cell (VTC) region or containing a necrotic core, VTC region, and peripheral angiogenesis (PA) region [ Fig. 3(a)]. [47][48][49] The irregular spiculated boundary of the VTC region was created employing the VICTRE tool. The necrotic core region and PA region were formed via erosion and dilation operations 50 applied to the surface of the VTC region using spherical structuring elements, i.e., matrices that identify the voxel and its neighboring voxels within a spherical area to be eroded or dilated, respectively. Here, the radii of the spherical structuring elements were 0.75 mm (the thickness of the ring-shaped VTC region) 49 for the erosion and 5 mm 48 for the dilation. The generated anatomical NLPs can be optionally inserted into the anatomical NBP. Among the candidate lesion locations predicted via the VICTRE tools, those that do not overlap with other (already inserted) lesions or the skin, nipple, and muscle were randomly chosen as the sites at which to insert the NLP.

Definition and Assignment of Functional, Optical, and Acoustic Properties
Functional, optical, and acoustic NBPs can be separately established via assignment of the specific properties of breasts to each tissue type in the anatomical NBPs described in Sec. 3.1. The optical contrast exhibited in OAT images is determined by light absorption by chromophores, and the concentration distributions and molar extinction coefficients of the chromophores determine the optical absorption distribution in the tissue. 34 The primary chromophores of the breast relevant to OAT are oxy-and deoxy-hemoglobin, water, fat, and melanin. The hemoglobin concentration in the tissue can be described by the total hemoglobin concentration in blood and the volume fraction of blood in the tissue, whereas the melanin concentration in the epidermis can be assumed to be proportional to the volume fraction of melanosome, an organelle where melanin is synthesized. 34 The volume fraction of the optical absorption coefficient of a pure chromophore can be a surrogate of its concentration in the tissue. 34,[51][52][53][54] In the proposed method, therefore, such functional properties were assigned to each tissue type in the anatomical NBPs, and then the corresponding optical properties were computed and assigned. Sections 3.2.1-3.2.3 elaborate on how the functional, optical, and acoustic properties were stochastically assigned to each tissue type. The prescribed statistical characteristics of the functional, optical, and acoustic properties were informed by a comprehensive literature survey to faithfully represent anatomically and physiologically realistic values and variations. 15,34-36,51-80

Stochastic assignment of functional properties
NBPs that describe the functional properties of the breast tissues without and with lesions were established as follows. The functional properties considered were the total hemoglobin concentration of blood c tHb;blood (μM); oxygen saturation s (%); and volume fractions of blood f b , water f w , fat f f , and melanosome f m (%). Although other chromophores exist, their contribution to the optical absorption coefficient at the wavelength in the near infrared (NIR) range (700 to 1100 nm) is negligible, so they are omitted. For this reason, the sum of the volume fractions of the primary Table 5 is always less than or equal to 100%. The value of c tHb;blood for each NBP was randomly sampled from a uniform distribution Uð1860; 2325Þ μM corresponding to a normal hematocrit level (36% to 44%) for women. 55 NBPs without lesions.  Table 5 provides the probability distributions of the functional properties defined for each tissue type. To mimic physiological spatial variation in the oxygen saturation distribution, the oxygen saturation in the tissues was modeled by solving a diffusion-reaction partial differential equation (PDE) 81 using FEniCS, 82 which is an open-source finite element library for solving PDEs. Specifically, the oxygen saturation s values sampled from the probability distributions in Table 5 were assigned to the specific tissues (skin, nipple, artery, and vein). Then the solution to the PDE was used to define the s distribution in the surrounding tissues (fat, ligament, TDLU, duct, and glandular tissues).
NBPs with lesions. For the NBPs containing malignant lesions, tumor hypoxia 11 was mimicked by assigning a relatively low s value sampled from the probability distribution in Table 5 to the voxels corresponding to the VTC region and necrotic core in the anatomical NBP [ Fig. 3(b)].   More details are provided in Appendix B. In addition to tumor hypoxia, other features of aggressive malignant lesions are (1) a relatively high total hemoglobin concentration, i.e., blood volume fraction f b , in the lesion and its peripheral region (tumor angiogenesis) compared with healthy tissues and (2) tumor necrosis. [45][46][47] For aggressive malignant lesions, the f b distribution was modeled to mimic tumor necrosis by assigning an f b value of 0% to the necrotic core. The capillaries newly sprouted from the vascularized lesions have diameters ranging from 3 to 40 μm 45 and, therefore, are too small to be geometrically resolved in the discretized NBP (a voxel size of 0.125 mm). For this reason, the presence of such capillaries was accounted for by assigning a spatially varying local increase in blood volume fraction f b within the VTC and PA regions [ Fig. 3(c)]. Additional details are presented in Appendix B.

Assignment of optical properties
NBPs that describe the optical properties of the breast tissues were established as follows. The optical properties considered were optical absorption coefficient μ a (mm −1 ), optical scattering coefficient μ s (mm −1 ), scattering anisotropy g, and refractive index n. Illumination wavelengths were selected from the NIR spectral range from 700 to 1100 nm, which is commonly used in OAT breast imaging. [3][4][5][6]13 The optical properties are wavelength-dependent; however, g and n do not vary significantly over the NIR range. 34 Thus, constant g and n values were defined for each tissue type regardless of the wavelength and were assigned to the corresponding tissue voxels, as presented in Table 6. For the PA region, the g and n values of its underlying tissues (fat/ligament/ TDLU/duct or glandular tissues) were assigned to the corresponding voxels.
The μ a value at r ¼ ðx; y; zÞ ∈ R 3 at a wavelength of λ was calculated based on the specified functional NBPs as 34 where I ¼ f oxygenated blood, deoxygenated blood, water, fat, melanosome g is a set of chromophores of interest. f i ðrÞ and μ a;i ðλÞ are the volume fraction at r and the optical absorption coefficient at a wavelength of λ of the pure chromophore indexed by i, respectively. The volume fractions of oxygenated blood f oxyblood ðrÞ ¼ f b ðrÞsðrÞ and deoxygenated blood f deoxyblood ðrÞ ¼ f b ðrÞf1 − sðrÞg were computed using the blood volume fraction f b ðrÞ and oxygen saturation sðrÞ defined in Sec. 3.2.1. The corresponding optical absorption coefficients were computed as μ a;oxyblood ðλÞ ¼ lnð10Þc tHb;blood ϵ HbO 2 ðλÞ for oxygenated blood and μ a;deoxyblood ðλÞ ¼ lnð10Þc tHb;blood ϵ Hb ðλÞ for deoxygenated blood, where ϵ HbO 2 and ϵ Hb are molar extinction coefficients (mm −1 M −1 ) of oxy-and deoxy-hemoglobin, respectively. The values of ϵ HbO 2 , ϵ Hb , and optical absorption coefficients of water, fat, and melanosome for the NIR range were from the data in Refs. 35 and 57-59.   Table 6. The μ s value of water at a wavelength of λ was assigned with an estimate from a curve based on Eq. (2) that fits to the measurements reported in Ref. 60. The VTC and necrotic core regions have a relatively high μ s value due to tumor cell proliferation, compared with healthy tissues 83 [see Table 6]. For the PA region in the NBP, the μ s values of the underlying tissues (fat/ligament/TDLU/duct or glandular tissues) were assigned to the corresponding voxels.

Stochastic assignment of acoustic properties
Acoustic NBPs for ultrasound computed tomography (USCT) were proposed by some of the authors in Ref. 15, with the tissues that are invisible in USCT imaging being excluded from consideration. For virtual OAT imaging, NBPs that describe the acoustic properties of the breast tissues were established as follows. The acoustic properties considered were sound speed c (mm∕μs), density ρ (g∕mm 3 ), and acoustic attenuation coefficient α 0 (dB∕MHz y mm) with power law exponent y. Table 7 provides the probability distributions of c, ρ, and α 0 for each tissue type. 15 For the PA region, similar to the assignment of μ s , g, and n values in Sec. 3.2.2, the acoustic properties of the underlying tissues (fat/ligament/TDLU/duct or glandular tissues) were assigned to the corresponding voxels. Several widely used time-domain wave propagation simulators assume a spatially homogeneous y 25 although the exponent y varies between 1 and 1.5 depending on the tissue type. 84 Accordingly, the homogeneous values of 1.1151, 1.1642, 1.2563, and 1.3635 were used for breast types A to D, respectively, as reported in Ref. 15. Once the Acoustic properties of water are consistent with an assumed temperature of 37°C, which is often used in breast OAT to minimize patient discomfort.
simulators start supporting spatially varying y distributions and tissue type-dependent y data becomes available, it will be possible to investigate the error induced by the spatially homogeneous y.

Examples of Generated NBPs and Corresponding OAT Images
To provide insight into the visual and quantitative characteristics of NBPs created by the proposed framework and the corresponding OAT images, example NBPs were produced as explained in Sec. 4.1. Subsequently, OAT measurement data were simulated based on a target imaging system described in Sec. 4.2, and 3D estimates of the induced pressure distributions were reconstructed using an FBP method.    Figure 5(b) shows box plots of the spatially averaged effective optical attenuation μ eff estimated from 40 proposed NBPs at the three wavelengths via the Beer-Lambert law-based estimation method in Ref. 33.

Examples of Generated NBPs
As shown in Fig. 5(a), the reference value (0.439%) of the breast volume percentage occupied by blood vessels, i.e., an estimated mean of the clinical OAT images (a dotted horizontal line), is within the interquartile range of the proposed NBPs (0.336% to 0.48%, median of 0.438%), but it is not within the interquartile range of the VICTRE NBPs (0.249% to 0.39%, median of 0.31%). In Fig. 5

Example OAT Images from Simulated Data
OAT images that depict the induced pressure distributions were reconstructed from simulated measurement data corresponding to the considered NBPs. The light delivery subsystem of the virtual OAT imaging system consisted of 20 arc-shaped illuminators uniformly distributed along the azimuthal angle. Each illuminator arc had a radius of 145 mm and was modeled using 25 cone beam sources as illustrated in Fig. 6. The acoustic detection subsystem consisted of 51,472 transducer elements uniformly distributed on a hemispherical aperture with a scanning radius of 85 mm. Each virtual transducer element recorded 3,720 time samples of pressure data at a sampling frequency of 20 MHz. Additional details on the virtual OAT imaging system are presented in Appendix C.
OAT data were virtually acquired via multiphysics simulation of the photoacoustic effect and subsequent wave propagation. First, the simulation of photon transport in biological tissues was conducted using the GPU-accelerated MCX version 1.9.0 23,24 to compute the induced initial pressure distribution at the three optical wavelengths of interest (757, 800, and 850 nm).  Next, wave propagation in acoustically heterogeneous media was simulated using the GPUaccelerated k-Wave toolbox version 1.3, 25 and pressure traces measured by the virtual transducer elements were recorded. Finally, measurement noise was modeled as independent and identically distributed Gaussian noise with zero mean and a standard deviation equal to 1% of the maximum optoacoustic signal strength of the entire ensemble for all three wavelengths, as was empirically determined based on the in vivo breast OAT data. 3,33 Using the simulated noisy acoustic measurements, the images, i.e., p 0 estimates, were reconstructed employing the FBP method with a voxel size of 0.25 mm. Figure 8 presents a visual comparison of 3D OAT images reconstructed from clinical measurement data and simulated measurement data produced using the proposed

Case Study: Acoustic Heterogeneity in Image Reconstruction
To demonstrate the usefulness of the proposed framework in virtual OAT studies, a case study was conducted to explore the impact of acoustic heterogeneity in OAT images of the breast. One phantom for each breast type (A, B, C, or D), four in total, was used for this case study. Acoustic measurements were simulated via virtual OAT data acquisition based on the target imaging system described in Sec. 4.2.
To explore the impact of acoustic heterogeneity in OAT images of the breast, the images, i.e., p 0 estimates, were reconstructed from the noisy acoustic measurements simulated in Sec. 4.2 with a voxel size of 0.25 mm. Three different approaches were used, each with the following assumptions: homogeneous acoustic properties of water in Table 7 for the entire computation domain (Approach 1); two sound speed values for water (c water ) and breast tissue (c breast ) each, and homogeneous density and acoustic attenuation properties of water for the whole domain (Approach 2); and the true distributions of acoustic properties (Approach 3). For image reconstruction, the LSQR 88 algorithm was implemented 89 to compute a least-squares estimate of p 0 . As the stopping rule, the iteration was terminated when the squared error between the true p 0 and estimated p 0 started to increase to avoid overfitting the noisy data. The goal of this case study was not to illustrate a practical image reconstruction algorithm but to investigate the effect of acoustic heterogeneity on image quality. Thus, a stylized stopping criterion, which requires knowledge of the true object, was purposely chosen to isolate the effect of acoustic heterogeneity from other sources of error or bias in OAT images, such as the design of an appropriate regularization functional. Figure 9 shows the true p 0 (first column) and the p 0 estimates (second to fourth columns) of Approaches 1 to 3, respectively. To tune c breast in Approach 2, sound speed values between 1.45 and 1.54 mm∕μs were swept with a step size of 0.005 mm∕μs, and the selected values for each breast type are presented in Table 8.
The relative squared error RSE ¼ kx true − x est k 2 2 ∕kx true k 2 2 and structural similarity index measure (SSIM) 90 between the true p 0 (x true ) and the reconstructed image (x est ) were calculated to quantify the accuracy of the reconstructed images. Table 8 provides the average RSE and SSIM calculated from the data of breast types A to D for all three wavelengths. The RSE and SSIM improved for all breast types when acoustic heterogeneity was accounted for in the image reconstruction. As shown in Fig. 9, the mitigation of clutter artifacts using the two-sound speed model (Approach 2) with respect to the homogeneous model (Approach 1) was more significant Fig. 8 Visual comparison of 3D OAT breast images reconstructed from (a) clinical measurement data and simulated measurement data produced using (b) the proposed NBP and (c) the unmodified VICTRE NBP. In panel (a), the clinical data were acquired by TomoWave Laboratories using LOUISA-3D. 3 In the unmodified VICTRE NBP, tissue properties were assigned to each tissue type in a piecewise constant manner. All three images were reconstructed using the FBP method. 42 The images were visualized using Paraview's volume rendering technique, which accumulates intensities based on the selected color and opacity maps. 40 in the type A breast (almost entirely fatty breast) than the type D breast (extremely dense breast). Here, clutter refers to background artifacts caused by uncompensated scattering or refraction effects. For the fatty type breast, the sound speed mismatch between fatty tissue and water is the main cause of clutter; therefore, the two-sound speed model (Approach 2) can effectively mitigate such artifacts. However, for denser type breasts, glandular tissue has a sound speed similar to water, and acoustic impedance heterogeneity across different breast tissues is the primary cause of clutter; thus the two-sound speed model is less effective. Fig. 9 True distributions of initial pressure p 0 at a wavelength of 800 nm (first column) and p 0 estimates reconstructed assuming homogeneous acoustic properties of water (second column), two sound speeds for water and breast tissue regions each and homogeneous density and acoustic attenuation properties of water (third column), and true distributions of acoustic properties (fourth column). The whole breast regions are presented from a side view in panels (a) and (c), and the regions at depths between 10 and 20 mm from the breast surface are illustrated from a front view in panels (b) and (d). The images in panels (a) and (b) are from a type A breast, and those in panels (c) and (d) are from a type D breast. Paraview 40 was used for volume rendering, and color maps were adjusted for better visibility.

Conclusion
In this work, a framework to generate realistic 3D NBPs that can be used in large-scale VITs of OAT breast imaging was established. For the first time, anatomically, physiologically, and optoacoustically realistic 3D NBPs and NLPs in varying sizes and shapes could be produced using the proposed framework. This was achieved by extending VICTRE tools to OAT imaging with substantial modifications. Because the proposed framework was implemented in a modular form, it facilitates further customization of the NBPs. Users can replace the relatively simple lesion model applied in this study with more biologically realistic models of tumor growth and metabolism. For example, heterogeneous distributions of multiple necrotic regions and blood vessels proliferating toward the lesion could be included. Future studies may also include explicit geometric modeling of vascular growth associated with tumors, functional and optical modeling of a specific type of breast cancer, analysis of lesion detectability as a function of their size and depth, and mechanical deformations of the breast to simulate slight breast compression in the existing OAT system. In summary, the produced NBPs and NLPs enhance the authenticity of virtual OAT studies, and the proposed framework can be widely employed for the investigation and development of advanced image reconstruction methods and machine learning-based methods, as well as the objective evaluation and optimization of the OAT breast imaging systems. Code packages of the proposed tools and 84 datasets that were made publicly available will enable researchers to immediately benefit from this work.

Appendix A: Generation of Blood Vasculature under the Skin Layer
Blood vasculature under the skin layer [Figs. 2(a) and 2(d)] was stochastically generated visually similar to the blood vessels observed in the clinical OAT images 3-6,33 via relatively computationally inexpensive image processing methods (random sampling, Gaussian blurring, Otsu's thresholding, 44 and skeletonization) that are commonly used. In Algorithm 1, the inputs N x , N y , and N z are the sizes of the anatomical NBP volume along the x-, y-, and z-axes; inputs σ 1 and σ 2 of Gaussian filters control the distance between blood vessel branches in the vessel tree; and input σ 3 determines the thickness of blood vessels under the skin layer. Based on the breast volume percentage occupied by blood vessels estimated from the clinical OAT images, the values of σ 1 , σ 2 , and σ 3 were set to 3.875, 6.125, and 0.219 mm (corresponding to the vessel diameter of 0.75 mm), respectively. The erosion in step (7) of Algorithm 1 shrinks bright regions in the breast mask without skin, i.e., voxels under the breast skin. Thus, the degree of erosion determines the distance between the inserted blood vasculature and the skin. For example, three erosion iterations at a voxel size of 0.125 mm yield the blood vessels at a depth of 0.375 mm under the skin. The proposed method does not produce a specific number and volume of arteries and veins under the skin layer. Instead, it stochastically creates visually plausible vascular structures employing computationally efficient methods, so the number and volume of the blood vessel segments produced in step (11) of Algorithm 1 are not always even for arteries and veins. The blood vessel segments were assigned to the artery mask and vein mask alternately, depending on the segment locations, in step (12) of Algorithm 1.
The produced blood vasculature was embedded into the anatomical NBPs by assigning the artery label to the artery mask region and the vein label to the vein mask region in the anatomical NBPs [Figs. 2(a) and 2(d)].

Appendix B: Modeling Distributions of Oxygen Saturation and Blood Volume Fraction
To model a spatially varying distribution of oxygen saturation sðrÞ at each spatial coordinate r within the phantom, the following diffusion-reaction PDE 81 was solved: E Q -T A R G E T ; t e m p : i n t r a l i n k -; s e c 8 ; 1 1 7 ; 4 7 3 −μΔsðrÞ þ X i∈T s χ i ðrÞðsðrÞ − s i Þ ¼ 0: Here, μ > 0 is a numerical parameter controlling the smoothness of the oxygen saturation sðrÞ; Δ is the Laplace operator; T s ¼ f skin, artery, vein, lesion g is a set of tissues of interest; χ i ðrÞ and s i are the indicator function and the target value of oxygen saturation, respectively, (randomly sampled according to the tissue-specific probability distributions in Table 5) assigned to the tissue i (i ∈ T s ).
Tumor angiogenesis was modeled in the blood volume fraction distribution f b ðrÞ at a spatial coordinate r within the phantom by solving the following diffusion-reaction PDE: E Q -T A R G E T ; t e m p : i n t r a l i n k -; s e c 8 ; 1 1 7 ; 3 5 2 −μΔf b ðrÞ þ X i∈T b where T b ¼ f VTC, necrotic core, fat/ligament/TDLU/duct, glandular, artery, and vein g is a set of tissues of interest and f bi is the target value of the tissue blood volume fraction sampled from the predefined probability distributions in Table 5 for the tissue i (i ∈ T b ).
The PDEs above were solved using a linear finite element on an unstructured tetrahedral mesh that is fitted to tissue interfaces. Unstructured meshes were generated using the pygalmesh software, 91,92 and the PDE was solved using the parallel finite element library FEniCS. 82 9 Appendix C: Details of Virtual OAT Imaging System and Data Acquisition The light delivery mechanism and the measurement geometry of the target virtual OAT imaging system were chosen to emulate one of the existing OAT systems for breast imaging, LOUISA-3D 3,33 developed by TomoWave Laboratories (Houston, Texas, United States). The detailed illumination geometry of the virtual light delivery system using 20 arc-shaped illuminators [see Fig. 6] is described in Table 9.
For the virtual OAT data acquisition, idealized point-like transducers, uniformly distributed on the arc-shaped optoacoustic probe specified in Table 10, were assumed; the probe collects a total of 3,720 time samples of pressure data at a sampling frequency of 20 MHz. For the simulation of OAT data acquisition, the k-Wave GPU code was used as the acoustic wave propagation simulator. With the assumption of idealized point-like transducers, the possible option to define transducer locations in the simulation supported by the k-Wave GPU code is a binary matrix, where the value at the corresponding transducer locations is "1" and at the others is "0." Therefore, 51,472 transducer locations were defined as a target measurement geometry, excluding the overlaps due to the discretization of Cartesian coordinates into the binary matrix with the given voxel size.
Acoustic OAT measurements were virtually acquired through the simulation of photon transport in breast tissues, calculation of initial pressure distribution p 0 , and simulation of acoustic wave propagation. In the simulation of the photon transport using the GPU-accelerated MCX, the light source and direction parameters were set according to the target system design [ Fig. 6 and Table 9], and the simulation domain size was set to 340 × 340 × 170 voxels with a spatial step size of 0.5 mm. Among the optical NBPs employed as inputs of the simulation, the distributions μ a and μ s for wavelengths of 757, 800, and 850 nm were downsampled using linear interpolation (from a voxel size of 0.125 to 0.5 mm), and constant values g and n were averaged within breast tissues. With these inputs, the distributions of optical fluence ϕ were simulated using 10 8 photons per beam for a time duration of 5 ns.
The true p 0 ¼ Γμ a ϕ at a voxel size of 0.25 mm [ Fig. 7(b)] was calculated via elementwise multiplication of μ a and the simulated ϕ, assuming a constant Grüneisen parameter Γ ¼ 1, as is commonly done for soft tissues. 16,33 The μ a and the simulated ϕ were downsampled (from a voxel size of 0.125 to 0.25 mm) and upsampled (from a voxel size of 0.5 to 0.25 mm) via linear interpolation, respectively. The MCX simulation at a coarser grid accompanied by the upsampling not only is equivalent to smoothing the p 0 in the acoustic wave propagation simulation to reduce the amplitudes of the high spatial frequency components but also significantly reduces the computation time of the MCX simulation.
In the simulation of acoustic wave propagation using the k-Wave toolbox and its GPU code, 25 the measurement geometry was configured based on the target system design [ Table 10] as explained above. The voxel size was set to the smallest possible, i.e., 0.25 mm, given the measurement geometry and the use of an NVIDIA Tesla V100 GPU with 32 GB memory. The simulation domain size was set to 700 × 700 × 350 voxels. The acoustic NBPs c, ρ, and α 0 employed to simulate acoustic measurements were downsampled using linear interpolation (from a voxel size of 0.125 to 0.25 mm). An anisotropic absorbing boundary layer, known as a perfectly matched layer, was set to be outside the simulation domain to prevent  undesired acoustic reflection at the boundaries. A pressure wavefield generated by the simulated p 0 was virtually propagated based on the acoustic NBPs and measured by the virtual transducer elements in the target imaging system.

Disclosures
The author A. A. Oraevsky discloses ownership interest in TomoWave Laboratories, Inc. Other authors have no relevant financial interests and no potential conflicts of interest to disclose.

Data, Materials, and Code Availability
Eighty-four datasets (21 per breast type) have been publicly released on the Harvard Dataverse. 28,29,30 Each dataset includes a natural-28 or hemispherical-shaped anatomical NBP 29 with four NLPs inserted, which are composed solely of a VTC region, or a natural-shaped anatomical NBP 30 with two NLPs inserted, one with a necrotic core and PA region and the other without, as described in Sec. 3.1; distributions of functional (c tHb;blood , s, f b , f w , f f , and f m ), optical (μ a , μ s , g, and n), and acoustic properties (c, ρ, α 0 , and y ), as described in Sec. 3.2; and simulated multi-wavelength optical fluence, initial pressure, and OAT measurements, as explained in Sec. 5